1 Overview

We are strong supporters of open-source tools for reproducible research. Priority index or Pi is developed as a genomic-led target prioritisation system, with the focus on leveraging genetic data to prioritise targets at the gene and pathway level. Based on evidence of genetic association from GWAS data, this prioritisation system is able to resolve modulated genes (seed genes) by utilising knowledge of linkage disequilibrium (co-inherited variants), distance from the gene, and evidence of genetic association with gene expression. Seed genes are scored in an integrative way quantifying the genetic influence. Scored seed genes are subsequently used as baits to rank seed genes plus additional (non-seed) genes; this is achieved by iteratively exploring the global connectivity of a gene interaction network. Genes with the top priority are further used to identify/prioritise pathways that are significantly enriched with highly prioritised genes. Prioritised genes are also used to identify a gene network interconnecting many highly prioritised genes but a few lowly prioritised genes as linkers.

In the Showcases section, we illustrate the power of Pi to perform disease-centric genetic target prioritisation at the gene, pathway and network level using several inflammatory diseases as exemplars, including Ankylosing Spondylitis, Spondyloarthritis, and Systemic Lupus Erythematosus.

In the Ontology annotations & TPN nominations section, we integrate ontology annotations and TPN nominations as an additional support for disease-specific genetic prioritisation by Pi.

In the Cross-disease comparisons section, we summarise and compare results of genetic target prioritisation across diseases, done at the gene, pathway and network level.

2 Web app

A web-based interface of Pi is available here, which allows on-the-fly prioritisation for a user-defined SNP list.

3 Workflow

4 R functions

This intends for end-users who are comfortable with R (with the latest version available at http://cran.r-project.org). Priority functions are all contained in a new package called PI which can be installed following instructions below:

# install the dependant packages including `XGR` and other packages used here
source("http://bioconductor.org/biocLite.R")
biocLite(c("XGR","devtools","VennDiagram"), siteRepos=c("http://cran.r-project.org"))

# also, install the `PI` package itself
library(devtools)
install_github(c("hfang-bristol/PI"), dependencies=T)

Priority functions are designed in a nested way. The core relation follows this route: xPrioritiserSNPs -> xPrioritiserGenes -> xPrioritiser -> xRWR, achieving gene-level prioritisation from an input list of SNPs (along with their significant level). The output of this route is taken as the input of either xPrioritiserManhattan for visualising gene priority scores, xPrioritiserPathways for prioritising pathways, or xPrioritiserSubnet for identifying a network of top prioritised genes.

4.1 xRWR

xRWR: implements Random Walk with Restart (RWR) estimating the affinity score of nodes in a graph to a list of seed nodes. The affinity score can be viewed as the influential impact over the graph that is collectively imposed by seed nodes. If the seeds are not given, it will pre-compute affinity matrix for nodes in the input graph with respect to each starting node (as a seed) by looping over every node in the graph. A higher-level function xPrioritiser directly relies on it.

4.2 xPrioritiser

xPrioritiser: uses RWR to calculate the affinity score of nodes in a graph to a list of seed nodes. A node that has a higher affinity score to seed nodes will receive a higher priority score. It is an internal function acting as a general template for RWR-based prioritisation. A higher-level function xPrioritiserGenes directly relies on it.

4.3 xPrioritiserGenes

xPrioritiserGenes: prioritises gene targets from an input gene interaction network and the score info imposed on its seed nodes/genes. This function can be considered to be a specific instance of xPrioritiser, that is, specifying a gene interaction network as a graph and seed nodes as seed genes.

There are two sources of gene interaction network information: the STRING database (Szklarczyk et al. 2015) and the Pathways Commons database (Cerami et al. 2011). STRING is a meta-integration of undirect interactions from a functional aspect, while Pathways Commons mainly contains both undirect and direct interactions from a physical/pathway aspect. In addition to interaction type, users can choose the interactions of varying quality:

Code Interaction (type and quality) Database
STRING_high Functional interactions (with high confidence scores>=700) STRING
STRING_medium Functional interactions (with medium confidence scores>=400) STRING
PCommonsUN_high Physical/undirect interactions (with references & >=2 sources) Pathways Commons
PCommonsUN_medium Physical/undirect interactions (with references & >=1 sources) Pathways Commons
PCommonsDN_high Pathway/direct interactions (with references & >=2 sources) Pathways Commons
PCommonsDN_medium Pathway/direct interactions (with references & >=1 sources) Pathways Commons

4.4 xPrioritiserSNPs

xPrioritiserSNPs: prioritises gene targets from an input gene interaction network and a given list of SNPs together with the significance level (eg GWAS reported p-values). To do so, it first defines seed genes and their scores that are calculated in an integrative manner to quantify the genetic influence under SNPs. Genetic influential score on a seed gene is calculated from the SNP score (reflecting the SNP significant level), the gene-to-SNP distance weight and the eQTL mapping weight. This function can be considered to be a specific instance of xPrioritiserGenes, that is, specifying seed genes plus their scores.

Knowledge of co-inherited variants is also used to include additional SNPs that are in Linkage Disequilibrium (LD) with GWAS lead SNPs. LD SNPs are calculated based on 1000 Genomes Project data (1000 Genomes Project Consortium 2012). LD SNPs are defined to be any SNPs having R2 > 0.8 with GWAS lead SNPs. The user can choose the population used to calculate LD SNPs:

Code Population Project
AFR African 1000 Genomes Project
AMR Admixed American 1000 Genomes Project
EAS East Asian 1000 Genomes Project
EUR European 1000 Genomes Project
SAS South Asian 1000 Genomes Project

4.5 xPrioritiserManhattan

xPrioritiserManhattan: visualises prioritised genes using manhattan plot, in which priority for genes is displayed on the Y-axis along with genomic locations on the X-axis. Also highlighted are genes with the top priority.

4.6 xPrioritiserPathways

xPrioritiserPathways: prioritises pathways based on enrichment analysis of genes with the top priority (eg top 100 genes) using a compendium of pathways. A highly prioritised pathway has its member genes with high gene-level priority scores, that is, having evidence of direct modulation by disease-associated lead (or LD) SNPs, and/or having evidence of indirect modulation at the network level.

In addition to pathways, enrichment analysis can be extended to other ontologies, as listed below:

Code Ontology Category
DO Disease Ontology Disease
GOMF Gene Ontology Molecular Function Function
GOBP Gene Ontology Biological Process Function
GOCC Gene Ontology Cellular Component Function
HPPA Human Phenotype Phenotypic Abnormality Phenotype
HPMI Human Phenotype Mode of Inheritance Phenotype
HPCM Human Phenotype Clinical Modifier Phenotype
HPMA Human Phenotype Mortality Aging Phenotype
MP Mammalian/Mouse Phenotype Phenotype
DGIdb DGI druggable gene categories Druggable
SF SCOP domain superfamilies Domain
PS2 phylostratific age information (our ancestors) Evolution
MsigdbH Hallmark gene sets Hallmark (MsigDB)
MsigdbC1 Chromosome and cytogenetic band positional gene sets Cytogenetics (MsigDB)
MsigdbC2CGP Chemical and genetic perturbation gene sets Perturbation (MsigDB)
MsigdbC2CPall All pathway gene sets Pathways (MsigDB)
MsigdbC2CP Canonical pathway gene sets Pathways (MsigDB)
MsigdbC2KEGG KEGG pathway gene sets Pathways (MsigDB)
MsigdbC2REACTOME Reactome pathway gene sets Pathways (MsigDB)
MsigdbC2BIOCARTA BioCarta pathway gene sets Pathways (MsigDB)
MsigdbC3TFT Transcription factor target gene sets TF targets (MsigDB)
MsigdbC3MIR microRNA target gene sets microRNA targets (MsigDB)
MsigdbC4CGN Cancer gene neighborhood gene sets Cancer (MsigDB)
MsigdbC4CM Cancer module gene sets Cancer (MsigDB)
MsigdbC5BP GO biological process gene sets Function (MsigDB)
MsigdbC5MF GO molecular function gene sets Function (MsigDB)
MsigdbC5CC GO cellular component gene sets Function (MsigDB)
MsigdbC6 Oncogenic signature gene sets Oncology (MsigDB)
MsigdbC7 Immunologic signature gene sets Immunology (MsigDB)

4.7 xPrioritiserSubnet

xPrioritiserSubnet: identifies a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. An iterative procedure of choosing different priority cutoff is also used to identify the network with a desired number of nodes/genes.

5 Showcases

In this section, we give a step-by-step demo of using Pi to carry out disease-specific genetic prioritisation of targets at the gene and pathway level, focusing on three inflammatory diseases as exemplars: Ankylosing Spondylitis (AS), Spondyloarthritis (SA, including AS and Psoriatic Arthritis), and Systemic Lupus Erythematosus (SLE).

First of all, load the packages including PI:

library(PI)
# the following packages are also needed
library(XGR)
library(GenomicRanges)
library(ggplot2)
library(RCircos)
library(VennDiagram)

# optionally, specify the local location of built-in RData
# by default:
RData.location <- "https://github.com/hfang-bristol/RDataCentre/blob/master/XGR/1.0.0"

5.1 Ankylosing Spondylitis

GWAS SNPs are collected mainly from GWAS Catalog (Welter et al. 2014), complemented by ImmunoBase and latest publications.

Import AS-associated GWAS lead SNPs (with the help of Anna Sanniti)

AS <- read.delim(file.path(path.package("PI"),"AS.txt"), stringsAsFactors=F)

The first 5 rows of the data frame AS are shown below, with the column SNP for AS GWAS SNPs and the column Pval for GWAS-detected P-values.

SNP Pval
rs10045403 5.8e-14
rs1018326 2.0e-06
rs10440635 3.0e-07
rs10781500 1.0e-06
rs10865331 1.9e-19

5.1.1 Gene-level prioritisation

It includes the following steps:

  1. Define seed genes based on distance-to-SNP window and genetic association with gene expression: nearby genes that are located within 200kb distance window of lead or Linkage Disequilibrium (LD) SNPs that are calculated based on European population data from 1000 Genome Project; expression associated genes (eQTL genes) for which gene expression is, either in a cis- or trans-acting manner, significantly associated with lead or LD SNPs, based on immune-stimulated eQTL data in monocytes (Fairfax et al. 2014).

  2. Score seed genes to quantify the genetic influence under lead or LD SNPs.

  3. Prioritise target genes by estimating their global network connectivity to seed genes. With scored seed genes superposed onto a gene interaction network (see above STRING_high), RWR algorithm is implemented to prioritise candidate target genes based on their network connectivity/affinity to seed genes. As such, a gene that has a higher affinity score to seed genes will receive a higher priority score.

pAS <- xPrioritiserSNPs(data=AS, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, restart=0.75, RData.location=RData.location)

The results are stored in the data frame pAS$priority, which can be saved into a file AS.genes_priority.txt:

AS_genes <- pAS$priority
write.table(AS_genes, file="AS.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
CAST 1 70.94986 0.05574 1
ERAP2 1 42.47168 0.03338 2
ERAP1 1 40.05554 0.03171 3
B3GNT2 1 23.33775 0.01838 4
LNPEP 1 22.31999 0.01767 5
IL23R 1 22.39796 0.01765 6
DDX39B 1 16.84224 0.01323 7
TBKBP1 1 15.61992 0.01230 8
NFKBIL1 1 13.79161 0.01154 9
ETS2 1 14.62834 0.01151 10
IL12RB2 1 14.34654 0.01136 11
KIF21B 1 11.32791 0.00948 12
PSMG1 1 11.96717 0.00945 13
C1orf106 1 10.42024 0.00827 14
IL6R 1 9.83556 0.00778 15
TNFRSF1A 1 8.42002 0.00690 16
GPR65 1 8.70283 0.00683 17
NPEPPS 1 8.24430 0.00648 18
KPNB1 1 8.19357 0.00645 19
NKX2-3 1 7.90464 0.00622 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_AS <- xPrioritiserManhattan(pAS, highlight.top=20, cex=0.5, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_AS)

5.1.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `AS.pathways_priority.txt`
AS_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(AS_pathways), AS_pathways)
write.table(output, file="AS.pathways_priority.txt", sep="\t", row.names=F)

The top 5 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
IL12-mediated signaling events 63 8 11.30 4.9e-10 2.3e-08 NFKB1, IL12B, TYK2, IL12RB2, IL1R1, IL2RA, NOS2, TBX21
IL27-mediated signaling events 26 5 11.20 2.2e-08 5.1e-07 IL12B, TYK2, IL12RB2, TBX21, IL27
NO2-dependent IL 12 Pathway in NK cells 17 4 11.10 9.4e-08 1.4e-06 IL12B, TYK2, IL12RB2, NOS2
IL23-mediated signaling events 37 5 9.23 2.1e-07 2.4e-06 NFKB1, IL12B, TYK2, NOS2, IL23R
Cytokine-cytokine receptor interaction 266 11 6.70 3.2e-07 2.9e-06 CSF2RB, TNFRSF1A, IL12B, IL6R, IL12RB2, IL1R1, IL2RA, LTBR, IL23R, IL1R2, CD27

This barplot displays the top 5 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- AS_pathways[1:5,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="SF", RData.location=RData.location)
AS_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 8 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
Leukotriene A4 hydrolase N-terminal domain 12 4 15.60 2.9e-09 3.2e-08 LNPEP, ERAP2, NPEPPS, ERAP1
TNF receptor-like 24 3 8.06 7.3e-06 4.0e-05 TNFRSF1A, CD27, LTBR
Immunoglobulin 481 10 4.74 4.3e-05 1.6e-04 TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL1R1, IL6R, IL12B
p53-like transcription factors 43 3 5.81 7.8e-05 2.1e-04 NFKB1, RUNX3, TBX21
Metalloproteases (zincins), catalytic domain 100 4 4.78 1.9e-04 4.2e-04 LNPEP, ERAP2, NPEPPS, ERAP1
Histone-fold 107 4 4.57 2.6e-04 4.8e-04 HIST1H4I, HIST1H3A, HIST1H3C, HIST1H2BD
Fibronectin type III 200 5 3.85 6.8e-04 1.1e-03 IL6R, IL12B, IL12RB2, CSF2RB, IL23R
DEATH domain 87 3 3.75 1.2e-03 1.6e-03 NFKB1, CARD9, TNFRSF1A

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="GOMF", RData.location=RData.location)
AS_gomf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
Leukotriene A4 hydrolase N-terminal domain 12 4 15.600 2.9e-09 3.2e-08 LNPEP, ERAP2, NPEPPS, ERAP1
TNF receptor-like 24 3 8.060 7.3e-06 4.0e-05 TNFRSF1A, CD27, LTBR
Immunoglobulin 481 10 4.740 4.3e-05 1.6e-04 TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL1R1, IL6R, IL12B
p53-like transcription factors 43 3 5.810 7.8e-05 2.1e-04 NFKB1, RUNX3, TBX21
Metalloproteases (zincins), catalytic domain 100 4 4.780 1.9e-04 4.2e-04 LNPEP, ERAP2, NPEPPS, ERAP1
Histone-fold 107 4 4.570 2.6e-04 4.8e-04 HIST1H4I, HIST1H3A, HIST1H3C, HIST1H2BD
Fibronectin type III 200 5 3.850 6.8e-04 1.1e-03 IL6R, IL12B, IL12RB2, CSF2RB, IL23R
DEATH domain 87 3 3.750 1.2e-03 1.6e-03 NFKB1, CARD9, TNFRSF1A
Homeodomain-like 294 3 1.160 7.2e-02 8.8e-02 NKX2-3, MIER1, SNAPC4
PH domain-like 420 3 0.518 1.9e-01 2.0e-01 TYK2, PLEKHG6, SH2B3

5.1.3 Network-level prioritisation

Network-level prioritisation is to identify a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. Given a gene network (the same as used in gene-level prioritisation) with nodes labelled with gene priority scores, the search for a maximum-scoring gene subnetwork is briefed as follows:

  1. score transformation, that is, given a threshold of tolerable priority scores, nodes above this threshold (nodes of interest) are scored positively, and negative scores for nodes below the threshold (intolerable).

  2. subnetwork identification, that is, to find an interconnected gene subnetwork enriched with positive-score nodes but allowing for a few negative-score nodes as linkers, done via heuristically solving prize-collecting Steiner tree problem (Fang and Gough 2014).

  3. controlling the subnetwork size, that is, an iterative procedure of scanning different priority thresholds is used to identify the network with a desired number of nodes/genes.

Notably, the preferential use of the same network as used in gene-level prioritisation is due to the fact that gene-level affinity/priority scores are smoothly distributed over the network after being walked. In other words, the chance of identifying such a gene network enriched with top prioritised genes is much higher. To reduce the runtime, by default only top 10% prioritised genes will be used to search for the maximum-scoring gene network.

# find maximum-scoring gene network with the desired node number=50
AS_net <- xPrioritiserSubnet(pNode=pAS, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- AS_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
IL12-mediated signaling events 63 11 18.90 1.5e-16 9.0e-15 NFKB1, B2M, IL12B, SOCS1, TYK2, IL12RB2, IL1R1, IL2RA, NOS2, SPHK2, TBX21
IL27-mediated signaling events 26 5 13.40 2.8e-09 8.3e-08 IL12B, TYK2, IL12RB2, TBX21, IL27
Cytokine-cytokine receptor interaction 266 11 8.40 5.6e-09 1.1e-07 CSF2RB, TNFRSF1A, IL12B, IL6R, IL12RB2, IL1R1, IL2RA, CXCR1, CXCR2, IL23R, IL1R2
NO2-dependent IL 12 Pathway in NK cells 17 4 13.30 1.7e-08 2.5e-07 IL12B, TYK2, IL12RB2, NOS2
IL23-mediated signaling events 37 5 11.10 2.7e-08 3.2e-07 NFKB1, IL12B, TYK2, NOS2, IL23R
Jak-STAT signaling pathway 155 8 8.17 6.9e-08 6.1e-07 CSF2RB, IL12B, SOCS1, IL6R, TYK2, IL12RB2, IL2RA, IL23R
Cytokine Signaling in Immune system 268 10 7.49 7.1e-08 6.1e-07 CSF2RB, ICAM1, B2M, SOCS1, IL6R, TYK2, IL1R1, IL2RA, IL1R2, UBA52
Hematopoietic cell lineage 87 6 8.37 2.5e-07 1.9e-06 ITGB3, IL6R, IL1R1, IL2RA, IL1R2, CD9
Signaling by Interleukins 106 6 7.46 9.6e-07 6.4e-06 CSF2RB, IL6R, TYK2, IL1R1, IL2RA, IL1R2
Immune System 912 16 5.54 1.1e-06 6.6e-06 CSF2RB, ICAM1, B2M, SOCS1, IL6R, TYK2, IL1R1, IL2RA, ICAM3, CDC27, IL1R2, LNPEP, UBA52, ICAM4, ERAP1, NPEPPS

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
Neutral zinc metallopeptidase 182 4 3.76 0.00084 0.0023 ERAP2, LNPEP, NPEPPS, ERAP1
Cell surface 467 7 3.74 0.00047 0.0023 TNFRSF1A, B2M, SCNN1A, ICAM1, CD9, IL1R1, CXCR2
External side of plasma membrane 189 4 3.66 0.00099 0.0023 B2M, SCNN1A, ICAM1, CD9
Drug resistance 349 4 2.15 0.01400 0.0240 B2M, ICAM1, SOCS1, PDE4A
Tumor suppressor 716 6 1.83 0.02500 0.0350 UBA52, IL12B, CDC27, RHOA, CDC37, RUNX3

5.2 Spondyloarthritis

Import SA-associated GWAS lead SNPs (with the help of Anna Sanniti and Alicia Lledo Lara)

SA <- read.delim(file.path(path.package("PI"),"Spondyloarthritis.txt"), stringsAsFactors=F)

5.2.1 Gene-level prioritisation

pSA <- xPrioritiserSNPs(data=SA, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, restart=0.75, RData.location=RData.location)

The results are stored in the data frame pSA$priority, which can be saved into a file SA.genes_priority.txt:

SA_genes <- pSA$priority
write.table(SA_genes, file="SA.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
DDX39B 1 91.60697 0.02378 1
NFKBIL1 1 75.01421 0.02074 2
CAST 1 70.94986 0.01842 3
ATF6B 1 62.40788 0.01621 4
TRAF3IP2 1 50.87129 0.01321 5
ERAP2 1 42.47168 0.01103 6
ERAP1 1 40.05554 0.01048 7
TNIP1 1 37.35657 0.00978 8
ANXA6 1 35.85679 0.00932 9
LCE3D 1 28.05582 0.00912 10
IER3 1 34.61371 0.00898 11
LCE3A 1 27.80789 0.00858 12
LCE3B 1 27.93689 0.00830 13
LCE3C 1 24.05016 0.00800 14
TYK2 1 28.70260 0.00756 15
LCE3E 1 24.55941 0.00749 16
MDC1 1 28.35607 0.00737 17
VARS2 1 25.92044 0.00673 18
ICAM3 1 24.84533 0.00662 19
B3GNT2 1 23.33775 0.00607 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_SA <- xPrioritiserManhattan(pSA, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SA)

5.2.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SA.pathways_priority.txt`
SA_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SA_pathways), SA_pathways)
write.table(output, file="SA.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
IL27-mediated signaling events 26 4 9.64 4.4e-07 7.9e-06 STAT2, IL12B, TYK2, IL12RB2
Jak-STAT signaling pathway 155 8 7.30 3.6e-07 7.9e-06 STAT2, IL12B, IL6R, TYK2, IL12RB2, CTF1, IL23A, IL23R
Beta2 integrin cell surface interactions 29 4 9.09 7.8e-07 9.3e-06 ICAM1, ITGAM, ICAM3, ICAM4
NO2-dependent IL 12 Pathway in NK cells 17 3 8.98 2.9e-06 2.1e-05 IL12B, TYK2, IL12RB2
IL23-mediated signaling events 37 4 7.94 2.7e-06 2.1e-05 IL12B, TYK2, IL23A, IL23R
IL 6 signaling pathway 22 3 7.82 8.7e-06 5.2e-05 CSNK2A1, IL6R, TYK2
IL12 and Stat4 Dependent Signaling Pathway in Th1 Development 23 3 7.63 1.0e-05 5.4e-05 IL12B, TYK2, IL12RB2
Allograft rejection 37 3 5.84 7.3e-05 2.9e-04 IL12B, HLA-E, HLA-DQA1
Viral myocarditis 71 4 5.43 7.0e-05 2.9e-04 FYN, ICAM1, HLA-E, HLA-DQA1
amb2 Integrin signaling 41 3 5.50 1.1e-04 3.8e-04 FYN, ICAM1, ITGAM

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- SA_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="SF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 6 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
Leukotriene A4 hydrolase N-terminal domain 12 3 12.30 2.3e-07 2.1e-06 LNPEP, ERAP2, ERAP1
Immunoglobulin 481 8 3.83 4.4e-04 2.0e-03 HLA-DQA1, HLA-E, ICAM1, ICAM3, ICAM4, ICAM5, IL6R, IL12B
Metalloproteases (zincins), catalytic domain 100 3 3.66 1.3e-03 4.0e-03 LNPEP, ERAP2, ERAP1
SH2 domain 112 3 3.38 2.0e-03 4.5e-03 TYK2, STAT2, FYN
Fibronectin type III 200 4 3.14 2.7e-03 4.8e-03 IL6R, IL12B, IL12RB2, IL23R
RNA-binding domain, RBD 251 3 1.66 3.2e-02 4.8e-02 RAVER1, SETD1A, NELFE

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="GOMF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
aminopeptidase activity 28 3 7.50 1.3e-05 0.00025 ERAP2, LNPEP, ERAP1
integrin binding 105 4 4.70 2.2e-04 0.00210 ICAM1, ICAM3, ICAM4, ICAM5
ATP-dependent RNA helicase activity 63 3 4.69 3.2e-04 0.00210 DHX30, SKIV2L, DDX39B
RNA binding 404 6 2.73 5.0e-03 0.02500 RPL34, SETD1A, NELFE, ILF3, SKIV2L, PUS10
cytokine activity 172 3 2.24 1.2e-02 0.04900 IL12B, IL23A, CTF1
nucleotide binding 304 4 1.95 2.1e-02 0.07000 RAVER1, REV3L, SETD1A, NELFE
poly(A) RNA binding 1118 10 1.81 2.9e-02 0.07700 MRPL4, RAVER1, NELFE, DHX30, SSRP1, XRCC6, ILF3, CAST, DDX39B, CS
signal transducer activity 227 3 1.69 3.1e-02 0.07700 STAT2, SLC44A2, KCNH7
enzyme binding 309 3 1.11 7.7e-02 0.16000 KAT8, IL6R, DNM2
receptor binding 321 3 1.04 8.6e-02 0.16000 HLA-E, ICAM3, APOF

5.2.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=50
SA_net <- xPrioritiserSubnet(pNode=pSA, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- SA_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Beta2 integrin cell surface interactions 29 7 17.60 1.2e-12 1.0e-10 ICAM1, ITGAL, ITGAX, ITGAM, ICAM3, FCGR2A, ICAM4
Immune System 912 22 8.34 1.5e-11 6.4e-10 NFKBIA, ITGB1, KEAP1, RPS6KA1, STAT2, FYN, ICAM1, ITGAL, TNFAIP3, IL6R, TYK2, IL1R1, PSMA6, ICAM3, RNF41, LNPEP, UBA52, REL, SQSTM1, ICAM4, ERAP1, NPEPPS
Integrin cell surface interactions 79 8 11.90 2.1e-10 5.9e-09 ITGB1, ICAM1, ITGAL, ITGAX, ITGB3, ITGAM, ICAM3, ICAM4
Jak-STAT signaling pathway 155 10 10.30 2.9e-10 6.2e-09 IL4, STAT2, IL12B, IL13, IL6R, TYK2, IL12RB2, CTF1, IL23A, IL23R
IL23-mediated signaling events 37 6 13.20 6.2e-10 1.1e-08 NFKBIA, IL12B, TYK2, NOS2, IL23A, IL23R
IL12-mediated signaling events 63 7 11.70 9.4e-10 1.4e-08 IL4, IL12B, TYK2, IL12RB2, IL1R1, NOS2, TBX21
Monocyte and its Surface Molecules 11 4 16.50 1.4e-09 1.8e-08 ITGB1, ICAM1, ITGAL, ITGAM
IL27-mediated signaling events 26 5 13.20 3.2e-09 2.7e-08 STAT2, IL12B, TYK2, IL12RB2, TBX21
Integrin family cell surface interactions 26 5 13.20 3.2e-09 2.7e-08 ITGB1, ITGAL, ITGAX, ITGB3, ITGAM
Leishmania infection 72 7 10.80 2.8e-09 2.7e-08 NFKBIA, IL4, ITGB1, IL12B, ITGAM, NOS2, FCGR2A

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
External side of plasma membrane 189 7 6.60 1.6e-06 1.6e-05 SCNN1A, ITGAL, ITGAX, ITGB1, ICAM1, IL4, IL13
Cell surface 467 10 5.50 4.1e-06 2.0e-05 TNFRSF1A, SCNN1A, STX4, ITGAL, ITGAX, ITGB1, ICAM1, IL1R1, IL4, IL13
Tumor suppressor 716 8 2.70 4.2e-03 1.4e-02 ITGB1, TNFAIP3, UBA52, IL12B, CSNK2A1, PSMA6, CDC37, RUNX3
Protease inhibitor 179 3 2.40 9.2e-03 1.9e-02 TNFAIP3, RPS6KA1, CAST
Neutral zinc metallopeptidase 182 3 2.36 9.7e-03 1.9e-02 LNPEP, NPEPPS, ERAP1

5.3 Systemic Lupus Erythematosus

Import SLE-associated GWAS lead SNPs

SLE <- read.delim(file.path(path.package("PI"),"SLE.txt"), stringsAsFactors=F)

5.3.1 Gene-level prioritisation

pSLE <- xPrioritiserSNPs(data=SLE, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, restart=0.75, RData.location=RData.location)

The results are stored in the data frame pSLE$priority, which can be saved into a file SLE.genes_priority.txt:

SLE_genes <- pSLE$priority
write.table(SLE_genes, file="SLE.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
ATF6B 1 219.97756 0.03584 1
NELFE 1 158.67378 0.02596 2
VARS2 1 132.00642 0.02149 3
ITGAM 1 128.93662 0.02123 4
PBX2 1 119.14179 0.01948 5
IRF5 1 118.38794 0.01938 6
STAT4 1 117.28679 0.01930 7
TNPO3 1 104.72725 0.01708 8
CUEDC2 1 99.35864 0.01616 9
HLA-DPB1 1 88.83311 0.01481 10
RNF114 1 80.66186 0.01319 11
CLIC1 1 79.67787 0.01301 12
PSMB9 1 78.94596 0.01287 13
C2 1 74.40265 0.01216 14
NFKBIL1 1 69.84825 0.01211 15
PYCARD 1 72.72229 0.01192 16
LTA 1 62.00410 0.01068 17
SKIV2L 1 62.27032 0.01046 18
HNRNPM 1 60.48517 0.00989 19
ITGAX 1 56.59653 0.00955 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_SLE <- xPrioritiserManhattan(pSLE, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SLE)

5.3.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SLE.pathways_priority.txt`
SLE_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SLE_pathways), SLE_pathways)
write.table(output, file="SLE.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Beta2 integrin cell surface interactions 29 5 10.60 4.1e-08 2.2e-06 ITGAX, ITGAM, ICAM3, FCGR2A, ITGAD
Type I diabetes mellitus 43 5 8.57 4.8e-07 1.2e-05 CD80, LTA, HLA-DOB, HLA-DPB1, HLA-DQA1
Leishmania infection 72 6 7.76 6.4e-07 1.2e-05 STAT1, ITGAM, HLA-DOB, HLA-DPB1, HLA-DQA1, FCGR2A
Initial triggering of complement 13 3 9.63 1.5e-06 1.7e-05 C2, C4A, CFB
Regulation of Complement cascade 13 3 9.63 1.5e-06 1.7e-05 C2, C4A, CFB
IL22 Soluble Receptor Signaling Pathway 16 3 8.62 3.9e-06 3.4e-05 STAT1, STAT4, TYK2
Allograft rejection 37 4 7.35 5.5e-06 3.4e-05 CD80, HLA-DOB, HLA-DPB1, HLA-DQA1
Interferon gamma signaling 62 5 6.94 4.4e-06 3.4e-05 STAT1, HLA-DPB1, HLA-DQA1, IRF5, IRF8
Systemic lupus erythematosus 139 7 6.14 5.3e-06 3.4e-05 CD80, C2, C4A, HLA-DOB, HLA-DPB1, HLA-DQA1, FCGR2A
Complement Pathway 19 3 7.85 8.1e-06 4.4e-05 C2, C4A, CFB

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- SLE_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="SF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 8 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
vWA-like 95 6 7.11 1.9e-06 2.5e-05 CFB, C2, ITGAD, ITGAM, ITGAX, XRCC6
Integrin alpha N-terminal domain 23 3 7.61 1.1e-05 6.7e-05 ITGAD, ITGAM, ITGAX
Integrin domains 25 3 7.27 1.6e-05 6.7e-05 ITGAD, ITGAM, ITGAX
MHC antigen-recognition domain 40 3 5.56 1.0e-04 3.4e-04 HLA-DOB, HLA-DPB1, HLA-DQA1
TNF-like 53 3 4.69 3.2e-04 8.2e-04 LTB, TNFSF4, LTA
SH2 domain 112 4 4.00 6.4e-04 1.4e-03 TYK2, STAT1, STAT4, BLK
Trypsin-like serine proteases 120 3 2.64 6.5e-03 1.2e-02 CFB, PRSS8, C2
Immunoglobulin 481 6 1.79 2.9e-02 4.7e-02 HLA-DOB, HLA-DPB1, HLA-DQA1, ICAM3, CD80, FCGR2A

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="GOMF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
tumor necrosis factor receptor binding 30 4 9.30 6.7e-07 1.9e-05 STAT1, LTA, LTB, TNFSF4
ATP-dependent RNA helicase activity 63 3 4.43 4.5e-04 6.3e-03 DHX30, SKIV2L, DDX39B
receptor binding 321 6 3.13 2.3e-03 1.6e-02 HSPA1A, ICAM3, LTA, BLK, LTB, TNFSF4
sequence-specific DNA binding transcription factor activity 825 11 3.00 2.4e-03 1.6e-02 ZNF668, IKZF1, PTTG1, MECP2, ETS1, ATF6B, PBX2, PHTF1, STAT1, STAT4, IRF5
actin filament binding 113 3 2.96 3.9e-03 1.9e-02 AIF1, MYO1B, FLNC
transcription factor binding 266 5 2.87 4.1e-03 1.9e-02 MECP2, ETS1, PBX2, GPX3, KAT8
enzyme binding 309 5 2.48 8.3e-03 3.3e-02 STAT1, UBE2L3, TSPAN33, PYCARD, KAT8
serine-type endopeptidase activity 154 3 2.29 1.2e-02 4.0e-02 PRSS8, C2, CFB
cytokine activity 172 3 2.06 1.7e-02 5.2e-02 LTA, LTB, TNFSF4
ligase activity 283 4 1.91 2.3e-02 6.1e-02 RNF114, UBE2L3, PPIL2, TNFAIP3

5.3.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=50
SLE_net <- xPrioritiserSubnet(pNode=pSLE, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- SLE_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Immune System 912 24 9.46 8.1e-14 9.1e-12 CD80, HRAS, STAT1, ICAM1, FCGR2B, CSK, CDKN1B, SOCS1, IRAK1, TYK2, CD44, SKP1, RASGRP3, ICAM3, IRF7, PYCARD, FCGR3A, RASGRP1, UBA52, IRF5, BLK, IRF8, SQSTM1, ICAM4
Beta2 integrin cell surface interactions 29 7 17.80 9.9e-13 5.6e-11 ICAM1, ITGAX, ITGAM, ICAM3, FCGR2A, ITGAD, ICAM4
Cytokine Signaling in Immune system 268 13 10.10 3.2e-11 1.2e-09 HRAS, STAT1, ICAM1, SOCS1, IRAK1, TYK2, CD44, SKP1, IRF7, UBA52, IRF5, IRF8, SQSTM1
Interferon gamma signaling 62 7 11.90 6.9e-10 1.9e-08 STAT1, ICAM1, SOCS1, CD44, IRF7, IRF5, IRF8
Leishmania infection 72 7 11.00 2.3e-09 5.1e-08 IL10, STAT1, TNF, ITGAM, IRAK1, FCGR2A, FCGR3A
Adaptive Immune System 524 15 7.76 2.7e-09 5.1e-08 CD80, HRAS, ICAM1, FCGR2B, CSK, CDKN1B, SOCS1, SKP1, RASGRP3, ICAM3, FCGR3A, RASGRP1, UBA52, BLK, ICAM4
Integrin cell surface interactions 79 7 10.40 5.0e-09 7.1e-08 ICAM1, CSK, ITGAX, ITGAM, ICAM3, RASGRP1, ICAM4
Interferon Signaling 158 9 9.20 5.0e-09 7.1e-08 STAT1, ICAM1, SOCS1, TYK2, CD44, IRF7, UBA52, IRF5, IRF8
IL-10 Anti-inflammatory Signaling Pathway 17 4 13.30 1.7e-08 2.1e-07 IL10, STAT1, STAT4, TNF
Interferon alpha/beta signaling 64 6 9.96 2.9e-08 3.2e-07 STAT1, SOCS1, TYK2, IRF7, IRF5, IRF8

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
External side of plasma membrane 189 6 5.06 4.9e-05 0.00059 TNF, ITGAX, ICAM1, CD80, FCGR3A, CD44
Clinically actionable 237 6 4.30 2.0e-04 0.00110 CDKN1B, SOCS1, TP53, HRAS, PRDM1, STAT4
Histone modification 249 6 4.14 2.8e-04 0.00110 TP53, CDK9, HCFC1, KAT8, MECP2, SKP1
Drug resistance 349 6 3.13 2.1e-03 0.00620 TNF, ICAM1, SOCS1, TP53, IL10, GPX3
Tumor suppressor 716 9 2.82 3.2e-03 0.00760 TNF, CDKN1B, TP53, HRAS, CDK9, UBA52, HCFC1, ETS1, CDC37

5.4 Dupuytren’s disease (DD)

Import DD-associated GWAS lead SNPs

DD <- read.delim("~/Sites/XGR/Dupuytren.SNPs.txt", stringsAsFactors=F)

The 13 DD-associated SNPs are:

SNPs Pvalue
rs16879765 6e-39
rs6519955 3e-33
rs611744 8e-15
rs11672517 7e-14
rs2912522 2e-13
rs8124695 8e-10
rs7524102 3e-09
rs10809650 6e-09
rs4730775 3e-08
rs2179367 3e-07
rs4789939 6e-07
rs4932194 8e-07
rs12032381 6e-06

5.4.1 Gene-level prioritisation

pDD <- xPrioritiserSNPs(data=DD, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, restart=0.75, RData.location=RData.location)

The results are stored in the data frame pDD$priority, which can be saved into a file DD.genes_priority.txt:

DD_genes <- pDD$priority
write.table(DD_genes, file="DD.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
PRR34 1 23.71986 0.10881 1
SFRP4 1 24.36136 0.10538 2
NME8 1 20.14126 0.08687 3
WNT7B 1 17.54777 0.07620 4
EIF3E 1 9.79590 0.04216 5
DUXA 1 8.85389 0.03895 6
RSPO2 1 9.01925 0.03893 7
ZIM3 1 6.90278 0.03296 8
PPARA 1 7.48707 0.03227 9
USP29 1 5.92433 0.02925 10
BPIFC 0 0.00000 0.02720 11
AURKC 1 4.23335 0.01824 12
ATXN10 1 3.84695 0.01663 13
WNT2 1 3.22187 0.01475 14
STARD3NL 1 3.31155 0.01436 15
TIMP2 1 3.05084 0.01316 16
ZNF460 1 2.39640 0.01033 17
ZC3H12D 1 2.06816 0.00890 18
ZBTB40 1 1.91320 0.00837 19
NUP43 1 1.82247 0.00785 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_DD <- xPrioritiserManhattan(pDD, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_DD)

5.4.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `DD.pathways_priority.txt`
DD_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(DD_pathways), DD_pathways)
write.table(output, file="DD.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Basal cell carcinoma 55 27 43.0 0.0e+00 0.0e+00 FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
Melanogenesis 101 27 31.4 0.0e+00 0.0e+00 FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
Wnt signaling pathway 151 30 28.3 0.0e+00 0.0e+00 FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK2, DKK1
Class B/2 (Secretin family receptors) 87 25 31.4 0.0e+00 0.0e+00 FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, WNT3A, FZD10, WNT16, FZD5, FZD3
Genes related to Wnt-mediated signal transduction 89 25 31.0 0.0e+00 0.0e+00 FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT11, WNT2B, FZD4, WNT9A, FZD6, FZD7, FZD2, FZD8, WNT10A, WNT3A, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK1
Wnt signaling network 28 15 33.4 0.0e+00 0.0e+00 FZD1, WNT2, WNT3, WNT5A, WNT7A, WNT7B, FZD4, FZD6, FZD7, FZD2, FZD8, WNT3A, FZD10, FZD5, DKK1
Hedgehog signaling pathway 56 18 28.2 0.0e+00 0.0e+00 WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B
Pathways in cancer 325 27 16.7 0.0e+00 0.0e+00 FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
GPCR ligand binding 400 25 13.5 2.8e-19 6.3e-19 FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, WNT3A, FZD10, WNT16, FZD5, FZD3
Genes encoding secreted soluble factors 344 20 11.5 6.8e-15 1.4e-14 WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B, SFRP4, MEGF6

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- DD_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="SF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 6 significant SCOP domains are shown below:

name nAnno nOverlap zscore pvalue adjp members
Frizzled cysteine-rich domain 20 10 31.30 0.0e+00 3.0e-20 SFRP4, FZD1, FZD4, FZD6, FZD7, FZD8, FZD10, FZD3, FZD5, FZD2
Thioredoxin-like 127 10 11.80 3.0e-11 1.5e-10 TXNL1, TXNRD3, PRDX2, NME8, PRDX3, TXNDC2, PRDX5, TXNDC8, TXN2, PRDX4
Galactose-binding domain-like 75 3 4.30 5.4e-04 1.8e-03 EPHA8, MFGE8, TXNL1
Growth factor receptor domain 127 3 2.98 3.8e-03 9.4e-03 RSPO4, MEGF6, RSPO2
Cysteine proteinases 149 3 2.62 6.6e-03 1.3e-02 USP5, USP29, USP36
KRAB domain (Kruppel-associated box) 356 4 1.68 3.3e-02 5.5e-02 ZIM3, ZNF543, ZNF460, ZNF304

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="GOMF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap zscore pvalue adjp members
frizzled binding 37 20 42.00 0.0e+00 0.0e+00 WNT5A, WNT3A, WNT3, WNT7A, WNT5B, FZD1, FZD7, WNT4, WNT11, WNT2, WNT8A, WNT10B, WNT6, WNT7B, WNT8B, WNT10A, WNT2B, WNT9A, WNT9B, WNT16
Wnt-activated receptor activity 21 10 27.80 7.0e-20 8.2e-19 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD3, FZD10
Wnt-protein binding 28 10 24.00 4.0e-18 3.2e-17 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD3, FZD10
receptor agonist activity 15 8 26.40 3.4e-17 2.0e-16 WNT5A, WNT3A, WNT3, WNT7A, WNT4, WNT2, WNT8A, WNT10B
protein disulfide oxidoreductase activity 23 5 13.10 3.8e-09 1.8e-08 TXNRD3, TXN2, TXNDC2, TXNDC8, TXNL1
PDZ domain binding 86 6 7.65 8.4e-07 3.4e-06 FZD4, FZD1, FZD8, FZD2, FZD7, FZD3
microtubule motor activity 68 3 4.06 7.5e-04 2.6e-03 DNAH11, DNAH5, DNAI2
G-protein coupled receptor activity 643 10 3.18 1.6e-03 4.9e-03 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD3, FZD10
ubiquitin-specific protease activity 90 3 3.35 2.1e-03 5.7e-03 USP29, USP5, USP36
cysteine-type endopeptidase activity 94 3 3.25 2.5e-03 6.0e-03 USP29, USP5, USP36

5.4.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=20
DD_net <- xPrioritiserSubnet(pNode=pDD, priority.quantite=0.1, subnet.size=21, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- DD_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap zscore pvalue adjp members
Basal cell carcinoma 55 11 30.10 0.0e+00 1.0e-20 WNT2, WNT6, WNT7B, WNT10B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, FZD10
Melanogenesis 101 12 24.10 1.0e-20 7.0e-20 CREBBP, WNT2, WNT6, WNT7B, WNT10B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, FZD10
Wnt signaling pathway 151 13 21.20 1.0e-20 1.0e-19 CREBBP, WNT2, WNT6, WNT7B, WNT10B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, FZD10, SFRP4
Class B/2 (Secretin family receptors) 87 11 23.80 1.1e-19 7.1e-19 WNT2, WNT6, WNT7B, WNT10B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, FZD10
Genes related to Wnt-mediated signal transduction 89 10 21.30 2.0e-17 1.0e-16 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, SFRP4
Wnt signaling network 28 7 26.80 7.1e-16 3.0e-15 WNT2, WNT7B, FZD4, FZD6, FZD7, FZD8, FZD10
Pathways in cancer 325 12 13.00 3.1e-14 1.1e-13 CREBBP, WNT2, WNT6, WNT7B, WNT10B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, FZD10
GPCR ligand binding 400 11 10.50 1.4e-11 4.4e-11 WNT2, WNT6, WNT7B, WNT10B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, FZD10
Hedgehog signaling pathway 56 6 16.10 3.2e-11 8.9e-11 WNT2, WNT6, WNT7B, WNT10B, WNT11, WNT10A
Signaling by GPCR 906 11 6.34 1.7e-07 4.2e-07 WNT2, WNT6, WNT7B, WNT10B, WNT11, FZD4, FZD6, FZD7, FZD8, WNT10A, FZD10

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 3 significant druggable categories are shown below:

name nAnno nOverlap zscore pvalue adjp members
Cell surface 467 5 3.78 0.0005 0.002 SFRP4, FZD10, WNT6, FZD4, RSPO2
G protein coupled receptor 855 5 2.19 0.0110 0.023 FZD10, FZD4, FZD6, FZD7, FZD8
Tumor suppressor 716 4 1.83 0.0230 0.031 UBA52, AURKC, PLK1, WNT10B

6 Ontology annotations & TPN nominations

In this section, we first describe annotation predictors using ontologies and target genes nominated by ULTRA-DD TPN members, and then integrate them as an additional support for disease-specific genetic prioritisation by Pi.

6.1 Annotation predictors

Using ontologies to annotate genes is one of the most effective ways to reuse existing knowledge. Also it is a scalable way to capture a particular knowledge sphere in an systematic way. An ontology is like a dictionary, containing vocabularies (called terms) and their relations. These terms and relations are understandable by humans, and readable by computers. According to how to organise relationships between terms, there are types of ontologies:

  1. structured ontology: terms are organised in a tree-like structure, such as Gene Ontology (Ashburner et al. 2000), Disease Ontology (Schriml et al. 2012), and Phenotype Ontologies in human and mouse (Köhler et al. 2013; C. L. Smith and Eppig 2009);
  2. non-structured ontology: terms are not organised as tree but simply listed as keywords, such as a collection of pathways, and gene druggable categories.

When integrating ontologies of different knowledge spheres, the use is limited by the heterogeneous nature: different terms/keywords represent the same or similar knowledge. For this reason, they are better used in a broader context to capture a general category of knowledge, eg a group of diseases instead of a specific disease. We consider the context broadly relevant to immune-related diseases. We choose 5 annotation predictors as informative carriers generalising their relatedness to immunity/inflammation diseases. For a given gene, we look at:

  1. OMIM evidence: whether it was identified in OMIM as an immune disease ontology gene;
  2. Phenotype evidence: whether it is annotated to immune-related phenotypes from Human Phenotype Ontology (HPO) and/or Mouse Phenotype Ontology (MPO);
  3. Function evidence: whether it is annotated to immune/inflammatory terms from Gene Ontology;
  4. Pathway evidence: whether it is annotated to immune-related pathways (from KEGG, BioCarta and Reactome);
  5. Druggable evidence: whether it is annotated to gene druggable categories from DGIdb.

Since the above predictors are relatively independent to each other, additive scoring scheme is used to calculate annotation scores per gene.

Import annotation predictors (pre-computed)

iAnno <- read.delim(file.path(path.package("PI"),"iAnno.txt"), stringsAsFactors=F)
# gene/target info
iSymbol <- iAnno[,2]
iGenes <- iAnno[,1:3]
# predictor info (including individual scores)
iPS <- iAnno[,22:26]
# additive predictor score
PS <- apply(iPS, 1, sum)
# combine additive predictor score and individual acores
iPS <- cbind(PS=PS, iPS)

The first 5 rows of the data frame iGenes are:

Target.GeneID Target.Symbol Target.Name
1 A1BG alpha-1-B glycoprotein
2 A2M alpha-2-macroglobulin
3 A2MP1 alpha-2-macroglobulin pseudogene 1
9 NAT1 N-acetyltransferase 1 (arylamine N-acetyltransferase)
10 NAT2 N-acetyltransferase 2 (arylamine N-acetyltransferase)

Their predictor info is stored in the data frame iPS, with the column PS for additive predictor scores:

PS OMIM Phenotype Function Pathway Druggable
0 0 0 0 0 0
1 0 0 0 0 1
0 0 0 0 0 0
0 0 0 0 0 0
1 1 0 0 0 0

Distributions of annotation predictor scores

df <- iPS
p <- ggplot(df, aes(PS))
p + geom_bar(fill="deepskyblue") + scale_y_log10() + geom_text(stat="bin",binwidth=1,aes(label=..count..),vjust=0,color="blue") + ylab("Number of genes (predicted)") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + scale_x_continuous(breaks=0:max(df$PS)) + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue"))

Pairwise correlations between predictors

df <- df[df$PS>0, -1]
res <- supraHex::sDistance(t(df), metric="pearson")
diag(res) <- 1
cellnote <- signif(1-res, digits=2)
diag(cellnote) <- NA
visHeatmapAdv(1-res, Rowv=F, Colv=F, colormap="bwr", zlim=c(-1,1), key=F, cexRow=1, cexCol=1, cellnote=cellnote, notecex=0.8, notecol="black", lhei=c(1,5), lwid=c(1,5))

6.2 TPN nominations

# TPN nomination info
iTPN <- iAnno[,9:15]
# flag whether nominated by ULTRA-DD TPN members
iNominated <- iAnno[,9]
# info provided by SGC Oxford
iSGC <- iAnno[,4:8]
# iSGC restricted to nominated targets
df <- iSGC[iNominated>0, ]
df$SGC_PI <- !(df$SGC_PI %in% c("Unassigned","Unassigned - TO",""))

Gene family analysis

Below show results of family analysis (provided by Brian and David).

The top family is integral membrane proteins (IMP), followed by histone proteins (histone lysine demethylase; KDM) and ubiquitin-related proteins. Other top families include DENN domain proteins.

Allocations to SGC PIs

As seen below, 714 genes with 490 (68%) have been allocation to specific SGC-Oxford PIs.

SGC PI allocations are conditioned on SG/ChemBio Tractability:

Generally speaking, targets with good tractability are more likely to be allocated to SGC PIs.

Distributions of annotation predictor scores for nominated genes

Distributions of annotation predictor scores are conditioned on whether genes are nominated or not:

It can be seen that nominated genes tend to receive a higher annotation scores than non-nominated genes do.

6.3 Integration for Ankylosing Spondylitis

Recall: AS genes prioritised by Pi are stored in the data frame AS_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(AS_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine AS_genes iPS iGenes iNominated
AS_combined <- cbind(AS_genes[,c(1,4)], genes_ap)

The top 20 AS-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
CAST 0.05574 1 0 0 0 0 1 0
ERAP2 0.03338 0 0 0 0 0 0 0
ERAP1 0.03171 2 0 0 1 1 0 0
B3GNT2 0.01838 0 0 0 0 0 0 0
LNPEP 0.01767 1 0 0 0 1 0 0
IL23R 0.01765 3 1 1 1 0 0 0
DDX39B 0.01323 1 0 0 0 0 1 0
TBKBP1 0.01230 0 0 0 0 0 0 0
NFKBIL1 0.01154 1 1 0 0 0 0 0
ETS2 0.01151 0 0 0 0 0 0 0
IL12RB2 0.01136 1 0 1 0 0 0 0
KIF21B 0.00948 0 0 0 0 0 0 0
PSMG1 0.00945 0 0 0 0 0 0 0
C1orf106 0.00827 0 0 0 0 0 0 0
IL6R 0.00778 5 1 1 1 1 1 0
TNFRSF1A 0.00690 4 1 1 1 0 1 1
GPR65 0.00683 1 0 0 1 0 0 1
NPEPPS 0.00648 2 0 0 0 1 1 0
KPNB1 0.00645 1 0 0 0 1 0 0
NKX2-3 0.00622 0 0 0 0 0 0 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving AS-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(AS_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(AS_combined$priority,1) # genetic priority at the top
p <- ggplot(AS_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("AS-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 39 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 117 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone. The origin of the wording a2maid is explained here.1
  • Top-left: 599 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to AS (maybe other immune diseases).

The 39 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
IL6R interleukin 6 receptor 0.00778 5 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00690 4 1
CARD9 caspase recruitment domain family, member 9 0.00593 4 0
TYK2 tyrosine kinase 2 0.00432 4 0
IFIH1 interferon induced with helicase C domain 1 0.00342 4 0
CD27 CD27 molecule 0.00315 4 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00261 4 0
IL12B interleukin 12B 0.00245 5 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00244 4 0
IL2RA interleukin 2 receptor, alpha 0.00243 5 1
ICAM1 intercellular adhesion molecule 1 0.00173 4 0
NCF4 neutrophil cytosolic factor 4, 40kDa 0.00158 4 0
ADAR adenosine deaminase, RNA-specific 0.00149 4 1
AIRE autoimmune regulator 0.00109 4 0
NOTCH1 notch 1 0.00077 4 0
HLA-B major histocompatibility complex, class I, B 0.00071 4 0
HLA-A major histocompatibility complex, class I, A 0.00064 5 0
B2M beta-2-microglobulin 0.00058 5 0
MAPK1 mitogen-activated protein kinase 1 0.00034 4 0
IL12A interleukin 12A 0.00022 4 0
IFNG interferon, gamma 0.00021 4 0
STAT4 signal transducer and activator of transcription 4 0.00021 4 0
JAK2 Janus kinase 2 0.00019 5 0
IL1B interleukin 1, beta 0.00016 4 0
IL1RN interleukin 1 receptor antagonist 0.00014 5 0
IL2 interleukin 2 0.00014 4 0
IL10 interleukin 10 0.00014 5 0
JAK3 Janus kinase 3 0.00013 4 0
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 0.00013 5 0
STAT3 signal transducer and activator of transcription 3 (acute-phase response factor) 0.00013 4 0
MALT1 MALT1 paracaspase 0.00013 4 0
NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha 0.00012 5 0
PIK3R1 phosphoinositide-3-kinase, regulatory subunit 1 (alpha) 0.00011 5 0
CBL Cbl proto-oncogene, E3 ubiquitin protein ligase 0.00011 4 0
IL6 interleukin 6 0.00011 5 0
PIK3CA phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha 0.00010 4 1
CD3E CD3e molecule, epsilon (CD3-TCR complex) 0.00010 5 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00009 5 0
BCL10 B-cell CLL/lymphoma 10 0.00009 4 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ERAP1 endoplasmic reticulum aminopeptidase 1 0.03171 2 0
IL23R interleukin 23 receptor 0.01765 3 0
NPEPPS aminopeptidase puromycin sensitive 0.00648 2 0
CACNA1S calcium channel, voltage-dependent, L type, alpha 1S subunit 0.00579 2 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 0.00491 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 0.00384 2 0
ITGB3 integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) 0.00381 3 0
TBX21 T-box 21 0.00379 2 0
IL1R1 interleukin 1 receptor, type I 0.00375 3 0
NOS2 nitric oxide synthase 2, inducible 0.00372 3 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
CAST calpastatin 0.05574 1 0
ERAP2 endoplasmic reticulum aminopeptidase 2 0.03338 0 0
B3GNT2 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 0.01838 0 0
LNPEP leucyl/cystinyl aminopeptidase 0.01767 1 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 0.01323 1 0

Label the gene network with annotation prediction

The nodes/genes in AS-specific gene network are color-coded by annotation additive predictor scores.

g <- AS_net
evi <- AS_combined[,3]
names(evi) <- rownames(AS_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- AS_net
evi <- AS_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

6.4 Integration for Spondyloarthritis

Recall: SA genes prioritised by Pi are stored in the data frame SA_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(SA_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SA_genes iPS iGenes iNominated
SA_combined <- cbind(SA_genes[,c(1,4)], genes_ap)

The top 20 SA-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
DDX39B 0.02378 1 0 0 0 0 1 0
NFKBIL1 0.02074 1 1 0 0 0 0 0
CAST 0.01842 1 0 0 0 0 1 0
ATF6B 0.01621 0 0 0 0 0 0 0
TRAF3IP2 0.01321 1 0 1 0 0 0 0
ERAP2 0.01103 0 0 0 0 0 0 0
ERAP1 0.01048 2 0 0 1 1 0 0
TNIP1 0.00978 0 0 0 0 0 0 0
ANXA6 0.00932 0 0 0 0 0 0 0
LCE3D 0.00912 0 0 0 0 0 0 0
IER3 0.00898 0 0 0 0 0 0 0
LCE3A 0.00858 0 0 0 0 0 0 0
LCE3B 0.00830 0 0 0 0 0 0 0
LCE3C 0.00800 0 0 0 0 0 0 0
TYK2 0.00756 4 1 1 0 1 1 0
LCE3E 0.00749 0 0 0 0 0 0 0
MDC1 0.00737 0 0 0 0 0 0 0
VARS2 0.00673 0 0 0 0 0 0 1
ICAM3 0.00662 2 0 0 1 1 0 0
B3GNT2 0.00607 0 0 0 0 0 0 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving SA-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(SA_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SA_combined$priority,1) # genetic priority at the top
p <- ggplot(SA_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("Spondyloarthritis-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 43 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 105 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.
  • Top-left: 607 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to Spondyloarthritis (maybe other immune diseases).

The 43 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
TYK2 tyrosine kinase 2 0.00756 4 0
IL12B interleukin 12B 0.00557 5 0
ICAM1 intercellular adhesion molecule 1 0.00410 4 0
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 0.00317 4 0
IL6R interleukin 6 receptor 0.00263 5 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00233 4 1
NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha 0.00216 5 0
CARD9 caspase recruitment domain family, member 9 0.00197 4 0
IFIH1 interferon induced with helicase C domain 1 0.00168 4 0
CD27 CD27 molecule 0.00105 4 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00092 4 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00088 4 0
IL2RA interleukin 2 receptor, alpha 0.00086 5 1
PSMB8 proteasome (prosome, macropain) subunit, beta type, 8 0.00057 5 0
NCF4 neutrophil cytosolic factor 4, 40kDa 0.00054 4 0
ADAR adenosine deaminase, RNA-specific 0.00052 4 1
DDX58 DEAD (Asp-Glu-Ala-Asp) box polypeptide 58 0.00052 4 0
C4A complement component 4A (Rodgers blood group) 0.00043 4 0
AIRE autoimmune regulator 0.00040 4 0
CFB complement factor B 0.00037 5 0
TNF tumor necrosis factor 0.00036 4 0
MAPK1 mitogen-activated protein kinase 1 0.00035 4 0
NOTCH1 notch 1 0.00027 4 0
HLA-B major histocompatibility complex, class I, B 0.00027 4 0
HLA-A major histocompatibility complex, class I, A 0.00025 5 0
B2M beta-2-microglobulin 0.00024 5 0
IKBKG inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma 0.00023 5 0
PTPN11 protein tyrosine phosphatase, non-receptor type 11 0.00019 4 0
CHUK conserved helix-loop-helix ubiquitous kinase 0.00017 4 0
IKBKB inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta 0.00015 5 0
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 0.00014 4 0
IL12A interleukin 12A 0.00014 4 0
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 0.00014 5 0
CD40 CD40 molecule, TNF receptor superfamily member 5 0.00013 5 0
STAT4 signal transducer and activator of transcription 4 0.00013 4 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00013 5 0
JAK2 Janus kinase 2 0.00012 5 0
IFNG interferon, gamma 0.00012 4 0
MALT1 MALT1 paracaspase 0.00011 4 0
IL2 interleukin 2 0.00011 4 0
IL1B interleukin 1, beta 0.00011 4 0
HLA-DRB1 major histocompatibility complex, class II, DR beta 1 0.00010 4 0
STAT3 signal transducer and activator of transcription 3 (acute-phase response factor) 0.00010 4 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ERAP1 endoplasmic reticulum aminopeptidase 1 0.01048 2 0
ICAM3 intercellular adhesion molecule 3 0.00662 2 0
IL23R interleukin 23 receptor 0.00590 3 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 0.00545 2 0
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 0.00543 2 1
FYN FYN proto-oncogene, Src family tyrosine kinase 0.00477 3 0
HLA-E major histocompatibility complex, class I, E 0.00416 2 0
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 0.00252 3 1
NPEPPS aminopeptidase puromycin sensitive 0.00215 2 0
CACNA1S calcium channel, voltage-dependent, L type, alpha 1S subunit 0.00192 2 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 0.02378 1 0
NFKBIL1 nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 0.02074 1 0
CAST calpastatin 0.01842 1 0
ATF6B activating transcription factor 6 beta 0.01621 0 0
TRAF3IP2 TRAF3 interacting protein 2 0.01321 1 0

Label the gene network with annotation prediction

The nodes/genes in Spondyloarthritis-specific gene network are color-coded by annotation additive predictor scores.

g <- SA_net
evi <- SA_combined[,3]
names(evi) <- rownames(SA_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- SA_net
evi <- SA_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

6.5 Integration for Systemic Lupus Erythematosus

Recall: SLE genes prioritised by Pi are stored in the data frame SLE_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(SLE_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SLE_genes iPS iGenes iNominated
SLE_combined <- cbind(SLE_genes[,c(1,4)], genes_ap)

The top 20 SLE-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
ATF6B 0.03584 0 0 0 0 0 0 0
NELFE 0.02596 0 0 0 0 0 0 0
VARS2 0.02149 0 0 0 0 0 0 1
ITGAM 0.02123 3 1 1 1 0 0 1
PBX2 0.01948 0 0 0 0 0 0 0
IRF5 0.01938 3 1 1 0 1 0 1
STAT4 0.01930 4 1 1 0 1 1 0
TNPO3 0.01708 1 1 0 0 0 0 0
CUEDC2 0.01616 0 0 0 0 0 0 0
HLA-DPB1 0.01481 2 1 0 0 1 0 0
RNF114 0.01319 0 0 0 0 0 0 0
CLIC1 0.01301 0 0 0 0 0 0 0
PSMB9 0.01287 3 0 0 1 1 1 0
C2 0.01216 2 0 0 1 1 0 0
NFKBIL1 0.01211 1 1 0 0 0 0 0
PYCARD 0.01192 2 0 0 1 1 0 0
LTA 0.01068 3 1 0 0 1 1 0
SKIV2L 0.01046 0 0 0 0 0 0 0
HNRNPM 0.00989 0 0 0 0 0 0 0
ITGAX 0.00955 0 0 0 0 0 0 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving SLE-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(SLE_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SLE_combined$priority,1) # genetic priority at the top
p <- ggplot(SLE_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("SLE-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 44 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 104 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.
  • Top-left: 607 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to SLE (maybe other immune diseases).

The 44 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
STAT4 signal transducer and activator of transcription 4 0.01930 4 0
C4A complement component 4A (Rodgers blood group) 0.00630 4 0
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 0.00622 4 0
CFB complement factor B 0.00517 5 0
PTPN22 protein tyrosine phosphatase, non-receptor type 22 (lymphoid) 0.00406 4 2
TYK2 tyrosine kinase 2 0.00226 4 0
IRF8 interferon regulatory factor 8 0.00189 4 0
IFIH1 interferon induced with helicase C domain 1 0.00160 4 0
IRF7 interferon regulatory factor 7 0.00155 4 0
CDKN1B cyclin-dependent kinase inhibitor 1B (p27, Kip1) 0.00134 4 0
HRAS Harvey rat sarcoma viral oncogene homolog 0.00111 4 0
ICAM1 intercellular adhesion molecule 1 0.00100 4 0
IL10 interleukin 10 0.00097 5 0
TNF tumor necrosis factor 0.00097 4 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00091 4 0
PNP purine nucleoside phosphorylase 0.00090 4 0
IL12A interleukin 12A 0.00079 4 0
MAPK1 mitogen-activated protein kinase 1 0.00074 4 0
CIITA class II, major histocompatibility complex, transactivator 0.00048 4 1
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 0.00034 4 0
CHUK conserved helix-loop-helix ubiquitous kinase 0.00027 4 0
MASP2 mannan-binding lectin serine peptidase 2 0.00023 5 0
HLA-DRB1 major histocompatibility complex, class II, DR beta 1 0.00023 4 0
BCL2 B-cell CLL/lymphoma 2 0.00020 5 0
C1QA complement component 1, q subcomponent, A chain 0.00019 4 0
CARD9 caspase recruitment domain family, member 9 0.00018 4 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00017 4 1
IKBKG inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma 0.00016 5 0
MYD88 myeloid differentiation primary response 88 0.00016 4 0
IL12B interleukin 12B 0.00016 5 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00016 4 0
PDGFRA platelet-derived growth factor receptor, alpha polypeptide 0.00015 4 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00014 5 0
C3 complement component 3 0.00014 5 0
IL2RA interleukin 2 receptor, alpha 0.00013 5 1
PSTPIP1 proline-serine-threonine phosphatase interacting protein 1 0.00013 4 0
MEFV Mediterranean fever 0.00013 4 0
CREBBP CREB binding protein 0.00012 4 1
IL1B interleukin 1, beta 0.00012 4 0
TLR2 toll-like receptor 2 0.00012 4 0
IFNG interferon, gamma 0.00012 4 0
ITK IL2-inducible T-cell kinase 0.00012 5 0
NLRP3 NLR family, pyrin domain containing 3 0.00011 4 1
CD40 CD40 molecule, TNF receptor superfamily member 5 0.00011 5 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 0.02123 3 1
IRF5 interferon regulatory factor 5 0.01938 3 1
HLA-DPB1 major histocompatibility complex, class II, DP beta 1 0.01481 2 0
PSMB9 proteasome (prosome, macropain) subunit, beta type, 9 0.01287 3 0
C2 complement component 2 0.01216 2 0
PYCARD PYD and CARD domain containing 0.01192 2 0
LTA lymphotoxin alpha 0.01068 3 0
STAT1 signal transducer and activator of transcription 1, 91kDa 0.00800 3 0
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 0.00392 2 1
TNFSF4 tumor necrosis factor (ligand) superfamily, member 4 0.00374 2 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ATF6B activating transcription factor 6 beta 0.03584 0 0
NELFE negative elongation factor complex member E 0.02596 0 0
VARS2 valyl-tRNA synthetase 2, mitochondrial 0.02149 0 1
PBX2 pre-B-cell leukemia homeobox 2 0.01948 0 0
TNPO3 transportin 3 0.01708 1 0

Label the gene network with annotation prediction

The nodes/genes in SLE-specific gene network are color-coded by annotation additive predictor scores.

g <- SLE_net
evi <- SLE_combined[,3]
names(evi) <- rownames(SLE_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- SLE_net
evi <- SLE_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

7 Cross-disease comparisons

7.1 Gene-level comparisons

Recall: disease-centric genetic prioritisation for genes by Pi are stored respectively in three data frames, that is, AS_genes for Ankylosing Spondylitis, SA_genes for Spondyloarthritis and SLE_genes for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.

# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, AS_genes$name)
df_AS <- AS_genes[ind,]
# Spondyloarthritis (SA)
ind <- match(iSymbol, SA_genes$name)
df_SA <- SA_genes[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, SLE_genes$name)
df_SLE <- SLE_genes[ind,]

# Combine into a data frame 'df_rank' and `df_priority`
## rank
df_rank <- cbind(AS=df_AS$rank, SA=df_SA$rank, SLE=df_SLE$rank)
## priority
df_priority <- cbind(AS=df_AS$priority, SA=df_SA$priority, SLE=df_SLE$priority)
rownames(df_rank) <- rownames(df_priority) <- iSymbol

Venn diagram of the top 200 genes across diseases

ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.15)
grid.draw(vp)

The 14 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
UBE2L3 ubiquitin-conjugating enzyme E2L 3 1 1
ICAM1 intercellular adhesion molecule 1 4 0
IFIH1 interferon induced with helicase C domain 1 4 0
TYK2 tyrosine kinase 2 4 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 2 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 1 0
NFKBIL1 nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 1 0
CDC37 cell division cycle 37 0 0
FDX1L ferredoxin 1-like 0 0
RAVER1 ribonucleoprotein, PTB-binding 1 0 0
RLTPR RGD motif, leucine rich repeats, tropomodulin domain and proline-rich containing 0 0
ZGLP1 zinc finger, GATA-like protein 1 0 0

Venn diagram of the top 200 genes in any diseases vs 714 TPN nominated genes

ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
data$Prioritised <- base::Reduce(union, data[1:3])
data$Nominated <- iSymbol[which(iNominated>0)]
category.names <- c('Prioritised', 'Nominated')
vp <- venn.diagram(x=data[4:5], filename=NULL, fill=c("red","orange"), category.names=category.names, margin=0.15, cat.pos=1)
grid.draw(vp)

The 30 genes common to disease-specific genetic prioritisations and TPN nominations (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
PXK PX domain containing serine/threonine kinase 1 4
PTPN22 protein tyrosine phosphatase, non-receptor type 22 (lymphoid) 4 2
IL2RA interleukin 2 receptor, alpha 5 1
ADAR adenosine deaminase, RNA-specific 4 1
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 4 1
BLK BLK proto-oncogene, Src family tyrosine kinase 3 1
IRF5 interferon regulatory factor 5 3 1
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 3 1
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 2 1
BANK1 B-cell scaffold protein with ankyrin repeats 1 1 1
GPR65 G protein-coupled receptor 65 1 1
IL23A interleukin 23, alpha subunit p19 1 1
KAT8 K(lysine) acetyltransferase 8 1 1
SCNN1A sodium channel, non voltage gated 1 alpha subunit 1 1
SLC11A1 solute carrier family 11 (proton-coupled divalent metal ion transporter), member 1 1 1
SLC15A4 solute carrier family 15 (oligopeptide transporter), member 4 1 1
SLC5A2 solute carrier family 5 (sodium/glucose cotransporter), member 2 1 1
SPHK2 sphingosine kinase 2 1 1
UBE2L3 ubiquitin-conjugating enzyme E2L 3 1 1
BRWD1 bromodomain and WD repeat domain containing 1 0 1
CCDC101 coiled-coil domain containing 101 0 1
MBTPS2 membrane-bound transcription factor peptidase, site 2 0 1
ORAI3 ORAI calcium release-activated calcium modulator 3 0 1
PARP11 poly (ADP-ribose) polymerase family, member 11 0 1
RAB5A RAB5A, member RAS oncogene family 0 1
RIOK2 RIO kinase 2 0 1
SLC9A8 solute carrier family 9, subfamily A (NHE8, cation proton antiporter 8), member 8 0 1
SMPD2 sphingomyelin phosphodiesterase 2, neutral membrane (neutral sphingomyelinase) 0 1
TET3 tet methylcytosine dioxygenase 3 0 1
VARS2 valyl-tRNA synthetase 2, mitochondrial 0 1

7.2 Pathway-level comparisons

Recall: disease-centric genetic prioritisation for pathways by Pi are stored respectively in three data frames, that is, AS_pathways for Ankylosing Spondylitis, SA_pathways for Spondyloarthritis and SLE_pathways for Systemic Lupus Erythematosus as exemplars.

Load all pathways into a data frame iPathway:

org.Hs.egMsigdbC2CPall <- xRDataLoader(RData.customised='org.Hs.egMsigdbC2CPall', RData.location=RData.location)
iPathway <- org.Hs.egMsigdbC2CPall$set_info[, c(1,2)]
# Ankylosing Spondylitis (AS)
ind <- match(rownames(iPathway), rownames(AS_pathways))
df_AS <- AS_pathways[ind,]
# Spondyloarthritis (SA)
ind <- match(rownames(iPathway), rownames(SA_pathways))
df_SA <- SA_pathways[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(rownames(iPathway), rownames(SLE_pathways))
df_SLE <- SLE_pathways[ind,]

# Combine into a data frame 'df_adjp'
## adjusted p-values
df_adjp <- cbind(AS=df_AS$adjp, SA=df_SA$adjp, SLE=df_SLE$adjp)
rownames(df_adjp) <- iPathway$name

This barplot displays prioritised pathways for three diseases, in which horizontal lines are used to indicate three groups of pathways prioritised:

  • Top panel: pathways common to all diseases, including IL23-mediated signaling events, IL27-mediated signaling events.
  • Middle panel: pathways common to any two diseases.
  • Bottom panel: pathways unique to individual diseases.

7.3 Network-level comparisons

Recall: the gene network identified by Pi are stored respectively in three data frames, that is, AS_net for Ankylosing Spondylitis, SA_net for Spondyloarthritis and SLE_net for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.

# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, V(AS_net)$name)
df_AS <- V(AS_net)$name[ind]
# Spondyloarthritis (SA)
ind <- match(iSymbol, V(SA_net)$name)
df_SA <- V(SA_net)$name[ind]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, V(SLE_net)$name)
df_SLE <- V(SLE_net)$name[ind]

# Combine into a data frame 'df_net'
df_net <- cbind(AS=df_AS, SA=df_SA, SLE=df_SLE)
rownames(df_net) <- iSymbol

Venn diagram of network genes across diseases

data <- list()
data$AS <- iSymbol[!is.na(df_net[,1])]
data$SA <- iSymbol[!is.na(df_net[,2])]
data$SLE <- iSymbol[!is.na(df_net[,3])]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.2)
grid.draw(vp)

The common part of the gene network shared by AS, SA and SLE

# identify common parts of SLE-network and AS-network. 
net_AS_SA_SLE <- graph.intersection(AS_net, SA_net, SLE_net, keep.all.vertices=F)

# append genes with info about annotation predictors  
g <- net_AS_SA_SLE
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]

The 6 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
ICAM1 intercellular adhesion molecule 1 4 0
TYK2 tyrosine kinase 2 4 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
CDC37 cell division cycle 37 0 0

The network nodes/genes shared by three diseases are color-coded by annotation additive predictor scores.

# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

The common part of the gene network shared by AS and SA

# identify common parts of AS-network and SA-network. 
net_AS_SA <- graph.intersection(AS_net, SA_net, keep.all.vertices=F)

# append genes with info about annotation predictors  
g <- net_AS_SA
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]

The common 21 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 4 1
SCNN1A sodium channel, non voltage gated 1 alpha subunit 1 1
IL12B interleukin 12B 5 0
IL6R interleukin 6 receptor 5 0
ICAM1 intercellular adhesion molecule 1 4 0
TYK2 tyrosine kinase 2 4 0
IL1R1 interleukin 1 receptor, type I 3 0
IL23R interleukin 23 receptor 3 0
ITGB3 integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) 3 0
NOS2 nitric oxide synthase 2, inducible 3 0
ERAP1 endoplasmic reticulum aminopeptidase 1 2 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
NPEPPS aminopeptidase puromycin sensitive 2 0
TBX21 T-box 21 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
IL12RB2 interleukin 12 receptor, beta 2 1 0
LNPEP leucyl/cystinyl aminopeptidase 1 0
CDC37 cell division cycle 37 0 0
MRPL4 mitochondrial ribosomal protein L4 0 0
RUNX3 runt-related transcription factor 3 0 0

The network nodes/genes shared by AS and SA are color-coded by annotation additive predictor scores.

# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

The common part of the gene network shared by SLE and SA

The common g vcount(g) genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 3 1
ICAM1 intercellular adhesion molecule 1 4 0
TYK2 tyrosine kinase 2 4 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 2 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
SQSTM1 sequestosome 1 1 0
TRIM56 tripartite motif containing 56 1 0
CDC37 cell division cycle 37 0 0
ITGAX integrin, alpha X (complement component 3 receptor 4 subunit) 0 0
RPL34 ribosomal protein L34 0 0

The network nodes/genes shared by SLE and SA are color-coded by annotation additive predictor scores.

The common part of the gene network shared by SLE and AS

The common 7 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
ICAM1 intercellular adhesion molecule 1 4 0
TYK2 tyrosine kinase 2 4 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
SOCS1 suppressor of cytokine signaling 1 1 0
CDC37 cell division cycle 37 0 0

The network nodes/genes shared by SLE and AS are color-coded by annotation additive predictor scores.

8 Benchmarking

Existing drug targets are stored in DrugBank.

Import drugs and targets

DrugBank <- xRDataLoader(RData.customised='DrugBank', RData.location=RData.location)

8.1 Ankylosing Spondylitis

# find AS approved or experimental drugs and their target genes
ind1 <- grep('ankylosing',DrugBank$indication, ignore.case=T, perl=T)
ind2 <- grep('approved|experimental',DrugBank$groups, ignore.case=T, perl=T)
ind <- intersect(ind1, ind2)
AS_targets <- unique(DrugBank[ind, c("GeneID","Symbol","description")])

The 30 AS existing drug target genes are:

GeneID Symbol description
231 AKR1B1 aldo-keto reductase family 1, member B1 (aldose reductase)
1645 AKR1C1 aldo-keto reductase family 1, member C1
712 C1QA complement component 1, q subcomponent, A chain
713 C1QB complement component 1, q subcomponent, B chain
714 C1QC complement component 1, q subcomponent, C chain
715 C1R complement component 1, r subcomponent
716 C1S complement component 1, s subcomponent
3577 CXCR1 chemokine (C-X-C motif) receptor 1
1909 EDNRA endothelin receptor type A
2209 FCGR1A Fc fragment of IgG, high affinity Ia, receptor (CD64)
2212 FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32)
2213 FCGR2B Fc fragment of IgG, low affinity IIb, receptor (CD32)
9103 FCGR2C Fc fragment of IgG, low affinity IIc, receptor for (CD32) (gene/pseudogene)
3551 IKBKB inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta
4790 NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1
4791 NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100)
4792 NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha
5467 PPARD peroxisome proliferator-activated receptor delta
5562 PRKAA1 protein kinase, AMP-activated, alpha 1 catalytic subunit
5563 PRKAA2 protein kinase, AMP-activated, alpha 2 catalytic subunit
5564 PRKAB1 protein kinase, AMP-activated, beta 1 non-catalytic subunit
5565 PRKAB2 protein kinase, AMP-activated, beta 2 non-catalytic subunit
5571 PRKAG1 protein kinase, AMP-activated, gamma 1 non-catalytic subunit
51422 PRKAG2 protein kinase, AMP-activated, gamma 2 non-catalytic subunit
53632 PRKAG3 protein kinase, AMP-activated, gamma 3 non-catalytic subunit
11251 PTGDR2 prostaglandin D2 receptor 2
5740 PTGIS prostaglandin I2 (prostacyclin) synthase
5742 PTGS1 prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase)
5743 PTGS2 prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
6197 RPS6KA3 ribosomal protein S6 kinase, 90kDa, polypeptide 3

8.1.1 Parameter optimisation

9 References

Below is a list of references that this work stands on:

1000 Genomes Project Consortium. 2012. “An integrated map of genetic variation from 1,092 human genomes.” Nature 135 (V): 0–9. doi:10.1038/nature11632.

Ashburner, M, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, et al. 2000. “Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.” Nat Genet 25 (1): 25–29. doi:10.1038/75556.

Cerami, E. G., B. E. Gross, E. Demir, I. Rodchenkov, O. Babur, N. Anwar, N. Schultz, G. D. Bader, and C. Sander. 2011. “Pathway Commons, a web resource for biological pathway data.” Nucleic Acids Research 39 (Database): D685–D690. doi:10.1093/nar/gkq1039.

Fairfax, Benjamin P, Peter Humburg, Seiko Makino, Vivek Naranbhai, Daniel Wong, Evelyn Lau, Luke Jostins, et al. 2014. “Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.” Science (New York, N.Y.) 343 (March): 1246949. doi:10.1126/science.1246949.

Fang, Hai, and Julian Gough. 2014. “The ’dnet’ approach promotes emerging research on cancer patient survival.” Genome Medicine 6 (8): 64. doi:10.1186/s13073-014-0064-8.

Köhler, Sebastian, Sandra C Doelken, Christopher J Mungall, Sebastian Bauer, Helen V Firth, Isabelle Bailleul-Forestier, Graeme C M Black, et al. 2013. “The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.” Nucleic Acids Research, November, 1–9. doi:10.1093/nar/gkt1026.

Schriml, L M, C Arze, S Nadendla, Y W Chang, M Mazaitis, V Felix, G Feng, and W A Kibbe. 2012. “Disease Ontology: a backbone for disease semantic integration.” Nucleic Acids Res 40 (Database issue): D940–6. doi:10.1093/nar/gkr972.

Smith, C L, and J T Eppig. 2009. “The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis.” Wiley Interdiscip Rev Syst Biol Med 1 (3): 390–99. doi:10.1002/wsbm.44.

Szklarczyk, Damian, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-cepas, Milan Simonovic, et al. 2015. “STRING v10 : protein – protein interaction networks , integrated over the tree of life.” Nucleic Acids Res 43 (Database): D447–D452. doi:10.1093/nar/gku1003.

Welter, Danielle, Jacqueline MacArthur, Joannella Morales, Tony Burdett, Peggy Hall, Heather Junkins, Alan Klemm, et al. 2014. “The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.” Nucleic Acids Research 42 (D1): 1001–6. doi:10.1093/nar/gkt1229.


  1. The wording a2maid comes from the mermaid (describing the maid of the sea having a fish’s tail), and instead of the maid of the sea, replacing with a2 turns to be the maid of our lab having two PhD students Anna Sanniti and Alicia Lledo Lara.